Features for segmenting and classifying long-duration recordings of "personal" audio

Authors

Daniel P. W. Ellis

Keansub Lee

Abstract

A digital recorder weighing ounces and able to record for more than ten hours can be bought for a few hundred dollars. Such devices make possible continuous recordings of “personal audio” – storing essentially everything heard by the owner. Without automatic indexing, however, such recordings are almost useless. In this paper, we describe some experiments with recordings of this kind, focusing on the problem of segmenting the recordings into different ‘episodes’ corresponding to different acoustic environments experienced by the device. We describe several novel features to describe 1-minute-long frames of audio, and investigate their effectiveness at reproducing hand-labeled ground-truth segment boundaries.

Related articles

Forced alignment for speech synthesis databases using duration and prosodic phrase breaks

Alignment of text to recorded audio is limited by the fact that standard techniques do not handle very long utterances well. This work presents a model for segmenting long recordings into smaller utterances. Our approach differs from typical forced alignment techniques in that prosodic phrase break locations are first estimated, and then words are placed around breaks based on length and break ...

Automatically segmenting and clustering minimal-impact personal audio archives

To capture essentially everything that you hear takes little more than a $100 MP3 player with a built-in microphone; a year’s worth of recordings is maybe 60 GB, or a small stack of writable DVDs. We have been collecting this kind of ‘personal audio’ on and off for a couple of years, and experimenting with methods to index and access the resulting data. Audio archives have several distinctive f...

Revealing the ecological content of long-duration audio-recordings of the environment through clustering and visualisation

Audio recordings of the environment are an increasingly important technique to monitor biodiversity and ecosystem function. While the acquisition of long-duration recordings is becoming easier and cheaper, the analysis and interpretation of that audio remains a significant research area. The issue addressed in this paper is the automated reduction of environmental audio data to facilitate ecolo...

Code-Copying in the Balochi Language of Sistan

This empirical study deals with language contact phenomena in Sistan. Code-copying is viewed as a strategy of linguistic behavior when a dominated language acquires new elements in lexicon, phonology, morphology, syntax, pragmatic organization, etc., which can be interpreted as copies of a dominating language. In this framework Persian is regarded as the model code which provides elements for b...

Voice pathology detection and classification using MPEG-7 audio low-level features

In this paper, a new pathological voice detection and pathology classification method based on MPEG-7 audio lowlevel features is proposed. MPEG-7 features are originally used for multimedia indexing, which includes both video and audio. Indexing is related to event detection, and as pathological voice is a separate event than normal voice, we show that MPEG-7 audio low-level features can do ver...

Publication date: 2004
